AITopics | attention link

Toward Mechanistic Explanation of Deductive Reasoning in Language Models

Maltoni, Davide, Ferrara, Matteo

arXiv.org Artificial Intelligence, Oct-13-2025

Recent large language models have demonstrated notable capabilities in solving problems that require logical reasoning; however, the corresponding internal mechanisms remain largely unexplored. In this paper, we show that a small language model can solve a deductive reasoning task by learning the underlying rules (rather than operating as a statistical learner). A low-level explanation of its internal representations and computational circuits is then provided. Our findings reveal that induction heads play a central role in the implementation of the rule completion and rule chaining steps involved in the logical inference required by the task.

Introduction

Recent Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning and problem-solving (Huang and Chang, 2023). Many approaches have focused on enhancing logical reasoning in LLMs, with a growing body of work introducing formal and symbolic logic-based benchmarks (Liu et al., 2025). While much of the literature emphasizes solving reasoning benchmarks, comparatively less attention has been devoted to understanding and explaining the underlying low-level computational mechanisms. Yet interpretability is crucial for designing more robust and targeted models that are less prone to errors.
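For readers unfamiliar with the term, the behaviour usually attributed to induction heads can be mimicked with a few lines of plain Python: find the most recent earlier occurrence of the current token and copy whatever followed it. This is only a toy illustration of that general pattern, not the circuit analysed in the paper; the function name `induction_predict` is hypothetical.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy stand-in for induction-head behaviour (illustrative only).

    Looks for the most recent earlier occurrence of the last token and
    returns the token that followed it, or None if there is no match.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the last token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    # The context contains "a b"; seeing "a" again suggests "b".
    print(induction_predict(["a", "b", "c", "a"]))
```

In a rule-completion setting, the earlier "a b" pair plays the role of a rule stated in the context, and the repeated "a" is the premise whose conclusion must be filled in.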

large language model, natural language, residual stream

2510.0934

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Generating Realistic Tabular Data with Large Language Models

Nguyen, Dang, Gupta, Sunil, Do, Kien, Nguyen, Thin, Venkatesh, Svetha

arXiv.org Artificial Intelligence, Oct-29-2024

While most generative models have been successful at image data generation, few have been developed for tabular data generation. Recently, owing to the success of large language models (LLMs) in diverse tasks, they have also been used for tabular data generation. However, these methods do not capture the correct correlation between the features and the target variable, hindering their application in downstream predictive tasks. To address this problem, we propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data. First, we propose a novel permutation strategy for the input data in the fine-tuning phase. Second, we propose a feature-conditional sampling approach to generate synthetic samples. Finally, we generate the labels by constructing prompts based on the generated samples to query our fine-tuned LLM. Our extensive experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks. It also produces highly realistic synthetic samples in terms of quality and diversity. More importantly, classifiers trained with our synthetic data can even compete with classifiers trained with the original data on half of the benchmark datasets, which is a significant achievement in tabular data generation.
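The abstract does not spell out the permutation strategy, so the following is only a sketch under stated assumptions: each table row is serialized to text as "name is value" clauses, the feature order is shuffled, and the target variable is always kept last so that it is conditioned on every feature. The function `serialize_row`, the clause format, and the example column names are all hypothetical and not taken from the paper.

```python
import random
from typing import Mapping, Optional


def serialize_row(
    row: Mapping[str, object],
    target: str,
    rng: Optional[random.Random] = None,
) -> str:
    """Turn one table row into a text line for fine-tuning (illustrative).

    Assumption: features are shuffled, the target column stays at the end.
    """
    if rng is None:
        rng = random.Random()
    features = [name for name in row if name != target]
    rng.shuffle(features)  # permute the feature columns only
    parts = [f"{name} is {row[name]}" for name in features]
    parts.append(f"{target} is {row[target]}")  # target always last
    return ", ".join(parts)


if __name__ == "__main__":
    example = {"age": 39, "education": "Bachelors", "income": ">50K"}
    print(serialize_row(example, target="income", rng=random.Random(0)))
```

Under the same assumption, a label-query prompt would be the same line with the value after the final "is" left blank for the fine-tuned model to complete.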

large language model, machine learning, natural language

2410.21717

Country:

Oceania > Australia (0.14)
North America > United States > California (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

New developments in Machine Translation part3

#artificialintelligence, Feb-15-2023, 19:15:08 GMT

Abstract: Deep neural networks have been shown to be vulnerable to small perturbations of their inputs, known as adversarial attacks. In this paper, we investigate the vulnerability of Neural Machine Translation (NMT) models to adversarial attacks and propose a new attack algorithm called TransFool. To fool NMT models, TransFool builds on a multi-term optimization problem and a gradient projection step. By integrating the embedding representation of a language model, we generate fluent adversarial examples in the source language that maintain a high level of semantic similarity with the clean samples. Experimental results demonstrate that, for different translation tasks and NMT architectures, our white-box attack can severely degrade the translation quality while the semantic similarity between the original and the adversarial sentences stays high.

machine translation part3, semantic similarity, translation task

Genre: Research Report > New Finding (0.39)

Industry: Information Technology > Security & Privacy (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.39)

Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture

Min, Zeping

arXiv.org Artificial Intelligence, Feb-1-2023

Transformers have achieved great success in machine translation, but transformer-based NMT models often require millions of bilingual parallel sentences for training. In this paper, we propose a novel architecture named attention link (AL) to help improve transformer models' performance, especially when training resources are scarce. We theoretically demonstrate the superiority of our attention link architecture in low-resource training. We also run a large number of experiments, including en-de, de-en, en-fr, en-it, it-en, and en-ro translation tasks on the IWSLT14 dataset, as well as real low-resource bn-gu and gu-ta translation tasks on the CVIT PIB dataset. All experimental results show that our attention link is effective and can lead to a significant improvement. In addition, we achieve a 37.9 BLEU score, a new SOTA, on the IWSLT14 de-en task by combining our attention link with other advanced methods.

attention link, machine learning, natural language

2302.0034

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
